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ABSTRACT 


To many critics and researchers, semantic meagerness 
is the main limitation of relational DBMS. The burden 
placed on users to remember attribute names and their 
domains is discussed and difficulties associated with the 
lack of set as a type for an attribute 1s examined. The 
paper explores the implications to high-level query 
languages necessitated by a set-type attribute. The 
allowance of semantically improper joins by DBMS is 
studied as 1s the lack of strong data type checking. The 
semantic shortcomings of system-chosen access paths is 
discussed. These problems are followed by recommended 


solutions. 


nia 


THES TS bi Sere Trier 


The views expressed in this thesis are those of the 
author and do not reflect the official “oolicy “Gou 
position of the Department of Defense or the U.S. 


Government. 


le 


Re 


it . 


bY 


Testes b> CONTENTS 
INTRODUCTION 
BATASGGSE CONCEPTS ™. . . 54 


A. DEFINITION OF A DATABASE SYSTEM 
B. COMPONENTS OF A DATABASE SYSTEM 


lee Data ye Gee co a es ee ce 

wis Hardware ... . 

Sis Software 

4. Users see esr ees o> eo ele 
C. ADVANTAGES AND DISADVANTAGES OF DATABASE 

BeeceSsiNGs : 

ia Redundancy ecan 26. Reduces 

ce. Avoidance of Inconsistency 

ahs Shared Data es 

4. Enforcements of St amdince a ne , 

2. Application of Security Restrictions 

6. Data Integrity : 

7. Balancing of Sayre re Pena 


RELATIONAL DATABASE MODEL ..... . 
A. RELATIONAL DATA STRUCTURE 


des Bef iniclomeeh eaeeelation = as < « 
ae Keys . ; Dealers 
cee Extensions sere tt natiaae 


Is RELATIONAL ALGEBRA a 
Peeeaditional Operators =<. . « «= = 


a. Union we x 
ae Difference Se ee ae eee 
eee Intersection se oo eases 


ai. ProOgue Creager °.¢ ee ee 
en special Peneciem ot. Aparato 


a. Selection 
aie Projection 
Ge Ufelsa 


C. RELATIONAL DATA MANIPULATION LANGUAGES 


ADVANTAGES OF RELATIONAL APPROACH 

A. INTRODUCTION : 

B. RELATIONAL DBMS ADVANTAGES | 

Fase of Understanding 

Data Independence ...... .- 

Power and Ease of Use .... . 
Theoretical Foundation SG. Ree ne 


fu We 


THE RELATIONAL APPROACH: PROBLEMS AND SOLUTIONS 
A. INTRODUCTION Sw er MED fui.) 2c is et 
BeeptewtoeroaND VYIEWS =. « « © »s «© «© w » « 

fee Problems : 

ee Sembee) oe) 6. CeCe tes oe ees. lel 


Ooodwyn os + 


10 


Op 

ie 
ice 
ie 
eS 
ies 


=) 
1 
16 
fee 
24 
eae 
eo 
ay 
ae 
a7 
28 
es 
eg 
oF 
30 
Se 


34 
34 
34 
34 
i 
ay 
3c 


hy 
eo 
GO 
40 
41 


Wali 


SET-TYPE ATTREBGtes 


QUERY LANGUAGE MODIFICATIONS 


ie Problems 
e. Solutions 
Te Problems 
ee Solutions 
JOINS : 
i Problems 
eae Solutions 
TYPE CHECKING 
oe Problems 
ee Solutions 
ACCESS PATHS 
ie Problems 
es Solutions 
CONCLUSIONS 2. ie 


LIST OF REPEGENEB@S wee 


INITIAL DISTRIBUTIGN Elsa 


4D 
45 
74 
6&4 
54 
&7 
69 
eke 
Ta 

Ja 
Tes 
Te 
72 
7a 
80 


84 


86 


88 


mEQure 
ei1gure 


Figure 
Figure 
Figure 
Figure 
Figure 


Figure 
Figure 
Figure 
Figure 
Figure 
Figure 
Figure 
Figure 
Figure 
Figure 
Figure 


WN Ww WwW 


noWnUunUnUon Uw 


on Of Wf 


-OpoVNnuUeruney 


© 8 


> 


PSO Perens 


Simple View of Database System . 


Student, Teacher and Student/Teacher 


Relations le a ee A 
Example of Cartesian EO t 
Domains and Attributes ... . 


Selection of Courses Relation 
Projection of Teacher Relation 


Natural Join of Student and Teacher 


Relations @ver District 


Diy Si Omeibemes t Om aes os 6s CS 
Command View feacher ..... .- 
Command View Teacher.District 5 
Relation School Ae Oe meee ee ee : 
Relation School with Set-Type Attribute 
Relation Titles an ce ene - —— 
Relation Titlese .. . ; 

User Invisible Relation ue. es 
Relations Teachers and ee 

Re lotirom olan cS, .<.° s+ ear ee 


Relations with A Sao aR Attribute 


14 


i? 
1s; 
20 
eq 
30 


31 

32 
44 
GG 
4& 
4b 
58 
£0 
5 
70 
2k; 
81 





i. INTRODUCTION 


Database technology is one of the most rapidly 
Growing fields in computer science. Its popularity 
among corporations and government agencies within the 
last ten years is staggering. Database systems on 
personal computers is now commonplace. 

Database denotes collections of data shared by end- 
users of computer systems. lie is the most modern 
technique of data storage hence its popularity and 
importance. Decision-makers within an organization make 
decisions by accessing the database. Obviously; the ease 
of access, securitys and integrity of the database is 
extremely important. 

Database systems are distinguished from other systems 
by certain features. The following may be considered as 
constituting the major features of a database system 
Peer. i:p. 611: 

- performance optimization 
- concurrent usage 
- data protection 


- data independence 


flexibility of data structure. 
A software program, known as the database management 
system (DBMS) manages the database. tie DBMS controls 


the storing and retrieval of data, and the users 


themselves. DBMS have also facilitated the development 
of many database applications (computer applications 
where many users at terminals concurrently access a 
database). 

In the coming years, database systems will become 
increasingly important. With the cost of labor steadily 
increasing and the cost of computers decreasing, people 
are being replaced by computers. CRef. e:p. ij] There is 


strong reason to believe tiie CeonGeW ieee pemee. 


There are currently three commonly implemented 
database models: 1) relational database, €) hierarchical 
Gatabase, and 3) network database. Relational DBMSs are 


based on a strong theoretical foundation which is in 
contrast to the network structure which was borrowed from 
telecommunications and the hierarchical structure which 
was borrowed from bill Of material systems. (Ref. 3:p. 
96] Other features which attract users to relational DBMS 
are its ease of use; data independence, and table data 
Structure: Relational DBMS have only recently begun to 
Gain widespread popularity in the commercial environment. 
The relational system has been touted as the system of 
the future. Whether or not it lives up to this billing 
remains to be seen. The primary goal of this thesis 1s 
to present problems and shortcomings of DBMS based on the 
relational model and to provide recommendations and 


solutions to these problems. 


Chapter II describes the basic concepts of a 
database. The chapter gives a definition of a database 
system, describes the components of a database system, 


and lists the advantages and disadvantages of database 


processing. Chapter Ie describes the relational 
database model. A relation is defined and common 
relational database model terms are explained. The 


traditional relational algebra operators are detailed as 
are the special relational operators. Finally; 
relational data manipulation languages are discussed. 

Chapter IV describes the advantages of the relational 
approach to database management systems. Advantages such 


as ease of understanding and data independence, as well 


as others, are outlined. Chapter V examines the problems 
of the relational approach to database management 
systems: Semantic burdens on the user, the lack of set- 
type attributes, query language modifications required 


by set-type attributes, semantically improper joins, the 
lack of strong data type checking, and access paths. The 
problems are detailed and followed By recommended 
solutions. Finally, conclusions are presented in Chapter 


a 


IIT. DATABASE CONCEPTS 


A. DEFINITION QF A DATABASE SYSTEM 

A database system is a system whose overall purpose 
is to maintain information and to make that information 
available on demand. The information 1s whatever the 
individual or organization deems to be significant to the 
entity being served. In short, information is anything 
required by the individual or organization to help in the 
process of making decisions. C[Ref. 4:p. 4] A database 
may also be considered a collection of facts ora 
repository for stored data which is both integrated and 
shared. Integrated means the database may be considered 
a consolidation of several otherwise discrete data files. 
Any redundancy amongst the files 1s either fully or 
partially eliminated. Shared means that separate pieces 
of data in the database may be shared among many 
different users. Each user may Rave access to the same 
piece of data and different users may use that data for 
different purposes. Different users may also access the 
data at the same time, known as "concurrent access". 
CRE. MoD oo 

The definition can be summarized with the following 
points: 

- A database 15 a generalized compilation of data. 
- This compilation is integrated to reduce redundancy. 


dy 


- Data structure 1s based on natural data relationships 


which provides all necessary access paths. The 
required access path to aunit of data is really a 
result of its relationship to other data. The 


ability to represent the natural data relationships 
with all the necessary access path is the essence of 
the distinction between a database and a conventional 
file. 


- A database must supply the differing data needs of 
Users in an efficient and effective manner. [{Ref. 


mp. OJ 
The importance of database systems to today’s 
Organizations cannot be understated. A database 15S a 


shared resource, thus its design and use must be managed 
with all the users in mind. The difference between sound 
decisions and poor decisions many times rests with the 
Quality of information in the database. 

The person who controls the database In an 
Organization is known as the Database Administrator 
Seba). This position is critical because this person (or 
Group of persons) assumes responsibility for protecting 
the database while at the same time attempting to 
Maximize benefits to users. fRefso <eg:u. ocGa!l The DBA’s 
responsibilities include the following: 

-~ determines the information content of the database 

- determines the storage structure and access strategy 
- provides a liaison with users 

- specifies security and integrity checks 

-~ defines backup and recovery strategy 


- monitors performance and responds to changing 
Beoqguimemnents. CLRef. G:pp. ecs-e7] 


The final piece of the puzzle is the database 
management system (DBMS). The DBMS is the software that 
manages a database; the word "management" may be 
interpreted to include the functions of creation and 
maintenance. (CRef. i pieos A full scale DBMS provides 
the following capabilities: 

- storage; update; and retrieval of data 


- a Catalog accessible to users which provides data 
Gcesecerption 


- transaction support to ensure that all or none of a 
series of database changes are reflected in the 


relevant databases 


- recovery services in the event of a failure (system 
Or program) 


=- \COnNGUT EeMmey COnmteaod services to ensure that 
concurrent transactions function the same way as if 
having been run in some sequential order 

- authorization services to ensure that access to and 
manipulation of data 165 1n accordance with defined 


constraints on users and programs 


- integration that includes support TOT data 
communication 


~- integrity services that ensure database states and 
changes in this state conform to specified rules. 

(Ref. S:p. 114) 

The database system provides the organization with 
centralized control of its operational date. (Ref. 4 ae 
9] Without a database system; an organization 1s subject 
to a wide range of private files interfacing with 


applications. Control of one of any organization’s most 


important assets, the operational Gata, 1S shaky. THis 


may prove to be very hazardous to the corporation’s well- 


being. 


B. COMPONENTS OF A DATABASE SYSTEM 

A database system consists of four major components: 
Gata, hardware, software, and users. Figure @€.1 shows an 
example of the arrangement of a simplistic system. 

1. Data 

As stated earlier; a database 1s a collection of 

integrated files. Kroenke states that “a database is a2 
collection of files and relationships among records in 
Gmose files." tRef. e:p. 11] 


Holding true to the lexicon in the computer 


industry, bits are grouped into bytes (8 bits = 1 byte) 
Or characters, characters are grouped into fields; and 
fields are grouped into records. A collection of records 


tUseeealled a file. €CRef. e€:p. 11] 

Database processing differs significantly from 
file processing where each file 1S considered to exist 
independently and the Stmucture of the files 1s 
distributed across application programs. 

For tutorial purposes; we will assume there 15s 
just one database, containing the totality of all stored 
data in the system although normally the system is split 


into one Or more databases. 


e. Hardware 
There 1S no special hardware needed for database 
systems. Hardware consists of device controllers, 
input/output channels, secondary storage devices (disks; 
drums, etc.) on which the database resides, together with 
associated devices. It is assumed primary storage will 
not be large enough store the entire database. 


Database applications often require extensive 


resources (1.e€.; larger main memory, faster central 
processing unit; and more direct access storage). This 
can be quite expensive. Database processing also 


involves special programs and overhead data. 

Special purpose computers that perform database 
processing functions, called database machines, were 
announced by several vendors in 198e {Ref. e:p. 8] 
According to this type of architecture, the computer 
processing the application program sends requests for 
service and data over a channel to the database machine. 


The machine processes the requests and sends results; 


data, or messages back to the main computer. In tinge 
manner; database processing can be performed in a 
concurrent manner with applications processing. The 


effectiveness of database machines remains to be seen. 
Bie Software 
The layer of software between the physical 


database Gare ares the data as actually stored) and the 


users of the system 1S known as the database management 
system (DBMS). Requests from users to interact with the 
database are processed by the DBMS. The DBMS acts asa 
safeguard between the users and hardware level details. 

The operating system (QS) iS a4 Program which 
controls the computer’s resources thus relieving users 
Ste umis burden. Operating system programs cause tasks 
to be performed and may be considered the nucleus of all 
the other programs. 

The Communications Control Program (CCP) performs 
communication-oriented tasks. It provides communications 
error checking (and correction if errors are found), 
coordinates terminal activity; sends messages to their 
proper destination, and formats messages for various 
types of terminal equipment. 

Application programs (AP ) are computer 
applications where many users at terminals concurrently 
access a database and are tailored to specific business 
needs. Specific needs such as order entrys inventory 
meeountings and billing are satisfied. [Ref. e:p. 9] 

4%. Users 

Three broad classes of users interface with the 
database system: application programmers, end-users and 
the database administrator (DBA). 

The application programmer is responsible for 


writing application programs that use the database. 


Typically written ina language such es EUBGL cere se 
application programs are used with data for retrieving 
information, creating new information; and deleting or 
changing existing information. Such functions are 


performed by the DBMS after it receives the appropriate 


request. The programs themselves may be conventional 
batch applications Oi onaaa ne applications which 
fume tonsa e support an end-user who accesses the 


database from an on-line terminal. 

An end-user may employ a query language provided 
as a composite part of the system to perform the 
TONE toe n= of retrieval, creation; deletion; and 
modificatiom@eateerdata: The alternative 1s for the end- 
user to utilize one of the on-line application programs 
that accepts commands from the terminal and then issues 
requests to the DBMS on the end-user ’s behalf. 

The database administrator (DBA) mentioned 
earlier 1s the person (or group of persons) responsible 
for the control of the database system. The DBA staff 
serves as a guardian of the database and as a focal point 


for resolving users’ conflicts. 


C. ADVANTAGES AND DISADVANTAGES OF DATABASE PROCESSING 
As mentioned earlier, an advantage of a database 
system 1s Chet. elie provides the organization with 
centralized control of its operational data [Ref. 4:p. 
9]. Also, database processing allows more information to 


10 


be yielded from a given amount of date. it oOGMarl on 1s 
then gained by processing these recorded facts-~ and 
figures, the data. Centralized control of the 
operational data provides the following advantages [Ref. 
feo. 1O-1el). 
1. Redundancy Can Be Reduced 

One important advantage of database processing is 
the elimination or reduction of data duplication. In 
conventional file processing systems, each application 
has its own private files. This may lead to considerable 
redundancy in stored data therefore wasting storage 
space. 

Elimination of duplicated information saves file 
Space and maey reduce processing requirements. It should 
be noted that all redundancy should not necessarily be 
eliminated. However; redundancy that does exist must be 
Seeametully monitored. CRef. ec:pp. 3-6) 

Ce. Avoidance of Inconsistency 

This follows closely with the above point. eT: 

two different entries in the database represent a single 


fact about the real world, there will undoubtedly be 


occasions where the two entries do not agree (1.e.; when 
only one entry has been properly updated). At such times 
the database is said to be inconsistent. A database in 


an inconsistent state may supply users with incorrect or 


Swit teacimageimrormatiom Which leads to user distrust of 
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computer generated output. If redundancy 1s controlled, 
the system can ensure that the database will never be 
inconsistent in the eyes of the user, by guaranteeing 


that any change made to either of the two entries is 


automatically applied to the other one also. Tih ae 
process 1S known as "propagating updates." (CRef. eC:pp. 3- 
6] 


3. Shared Data 
As discussed earlier, sharing means existing as 
well as new applications that are developed can Share the 
Gata in the database. Users have access to the same data 
and different users may use the data in different ways 
ana foredifferent purposes. EReT. 22ppe 3-6) 
G. Enforcements of Standards 
Because the data 1s)6ccentralized (vice private 
files for each application), the DBA 1s able to ensure 
that all pertinent standards are observed with respect to 
the representation of the data. Applicable standards may 
include any or ali of the following: organization, 
state, division, industrys national, and international 
standards. Standardizing helps in data migration 
between systems. CRef. eC:pp. 3-6] 
S. Application of Security Restrictions 
With complete control over the database system; 
the DBA can ensure the database may be accessed only 


through the proper channels therefore requiring security 


ie 


checks to be carried out whenever access to classified 
Gata is attempted. It should be noted that the very 
nature of a centralized database system requires a sound 
Seeurity system be in place. (Ref. @:pp. 3-63 
5. Data Integrity 

Data integrity deals with the problem of 
Quaranteeing the data in the database is accurate. There 
is a lack of data integrity if two entries that represent 


the same fact are inconsistent (which can only occur if 


redundancy exists in the stored data). Because the 
database is shared (unlike private files), data integrity 
is of extreme importance. Most current database products 
are weak in this area. A common result from this 
shortcoming 16 EG Vetaemc reports by corporation 


employees who use these database products. (Ref. C:pp. 
ao | 
Pemereatameing of Conflicting Reports 
Because the database system 1S Shared by the 
whole orGanization as opposed to just individuals, the 
DBA must structure the system to provide service that is 


in the best interests of that organization. (Ref. C:pp. 


a6) 

There are many disadvantages that come along with 
databases. A major disadvantage of database processing 
Is that it can be very expensive. DEMS @iare “not 


Inexpensive, and because of itS main memory requirements, 


ee 


additional memory may Have to be purchased. The system 
may also gain exclusive control of the Central Processing 
Unit (CPU) forcing the user to upgrade to a more powerful 
COMPUTE. 

Data processing tends to be very complex. The 
database system and application programs must be able to 
process large amounts of data that are interrelated in 
different formats. This results in sophisticated 
programming and requires highly trained programming and 
maintenance personnel. Backup and recovery of data also 
increases system complexity and is difficult to carry out 
in the database environment. 

Another disadvantage is that integration; and 
hence centralization; increases vulnerability. The 
entire system may fail if one component fails. 
Obviously; this becomes extremely critical if the users’ 
organization depends heavily upon the database for its 


day-to-day operations, [Ref e7 pp sos 7. 


US Bie 


USER 





APPLICATION SOFTWARE 





USER 


Figure 2.1. Simple View of Database System 
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TTI. RELATIONAL DATABASE MODEL 


eee RELATIONAL DATA STRUCTURE 

ie celational model was first proposed by Dr. cE. F. 
SeeG im i970. Only recently has the model moved from the 
world of theoretical interest to that of practical 
Significance. This was the result of the announcement of 
several important relational DBMS products (1.@.5 


Sel/DS, ORACLE, etc.). 


A relation is a mathematical term Tor a two- 
dimensional table. It is characterized by rows and 
Sarumns that contain data item values. It is called a 


relation and not a matrix due to a lack of Homogeneity in 
its entries; the entries are homogeneous in the columns, 
Sle mot in the rows. A relational database 1s composed 
of such relations which may be stored on a physical 
@GeviGcGe 1n a variety of ways. [Ref. l:o9. 130) 

To explain the relational data structure, sample data 
in relational form will be very helpful. Figure 3.1 
shows a relational view of data which 1S organized into 
three tables: SRUDE Nise TESGEHERs and STUDENT TEACHER 
(ST). The STUDENT table contains, for each student, a 
student identification number, a student name, the year 
in school of that student, and the district where the 
student attends school. The TEACHER table contains; for 
each teacher, a teacher identification number, a course 


Le 


name, a class size, the credit hours for the course, and 
the teaching Gdrete tec: As in the STUDENT relation; we 
assume each teacher haS a unique identification number. 
The STUDENT -fEAGHER table contains a student 
identification number, a teacher identification number, 
and a tenure value. 
1s Definition of.a Relaticen 

Given a collection of sets D1,De;,...;,;Dn Gane 
necessarily distinct) R is relation on those n sets if it 
is a set of ordered n-tuples <di,de,...;dn> such that di 
belongs to D1, de belongs to De;...; dm belongs town 
Sets D1, Des, ...,;Dn are the domains of R. The value n is 
the degree of R. Relations of degree 1 are called unary; 
degree e:binery, degree S3:ternery,;, and degree n:in-ary. 
LRef «Sips 7a) 

An equivalent definition of a relation can be 
Given from a mathematical set-theory perspective. ~ 


relation is any subset of the Cartesian product of one or 


more domains. For exampie, if we have n setss Say n = @; 
Al = {a,b and AC = en al Sie then Ai x Ae is the 
Cartesian product of these n sets. It is the set of all 


possible ordered n-tuples <ai,ae@> such that al belongs to 
Ai and ae belongs to Ae. The result of Al xX Ae 15 
€(a,si)d),(a,e2),(a,3),(6b,1),(b,2)5,(b,3)}. Figure 3.2 shows 
the Cartesian product of two sets SID and TID (student 


and teacher identif igatrensnumee 
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leh, 





moeciiite S.1. Student, feacher and Student/Teacher 
Relations 
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Si ale 2 UD TID 
Se ie Si aha 
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Se ia: Sa ee. 
Se T1 
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Se ike 

Sa yl 

Ss Te 

Sa ibe 
Figure 3.@. Example ef Cartestaner, camer 


A relation called STUDENT of degree four is 
1llustrated in Figure 3.i(a). The four domains are sets 
of values representing student identification number 
(SID), student name (SN), the year in school of each 
student (YR), and the district of the school (DISTRICT) 
respectively. The YR domain; for example, is the set of 
all valid year numbers. Note that there may be some year 
numbers included in this domain that do not actually 
appear in the STUDENT relation at this particular time 
(relations are time-varying). 

A relation may be viewed as a table, where each 


row 1S called a tuple (corresponding to records) and each 


column represents a field within the record. The columns 
are called attributes. The number of tuples in a 
relation is called the cardinality of that relation. For 
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example, the cardinality of the STUDENT relation is three 
Sea it has four attributes (columns). 

A domain may be thought of as a bank of values 
from which are drawn the actual values for a given 
attribute. Keep in mind that arelation 15 a set and 


sets are not ordered. There is no ordering defined among 


the tuples of a relation. However 5 the domains of a 
relation do have an ordering among’ them. If we have a 
tuple (al,ac,y.-.;an) with n components, the value of the 


fee component in this n-tuple must be pulled from the j 
th domain. Beet. 4+ pecoe)] Simp rPigure 3.2Ca), the third 
tuple of the STUDENT relation is (S3, VENTURA, 3, SAL). 
ime Value of the third field of this tuple under the 
attribute YR is drawn from the third domain all positive 
integers. In mathematical terms, rearranging the four 
columns results ina different relation. 

It 15 important to understand the difference 
between a domain CRef. 4:p. 85] and attributes which are 
drawn from that domain. Figure 3.3 shows a4 part of a 
relational schema in which four domains (STUDENT_ID, 
STU NAME, YEAR, and LOCATION) and one relation (STUDENT) 
are defined using a data definition languege. The 
relation is declared with four attributes (SID, SN, YR; 
ang DISTRICT), and each attribute is designated as being 
pulled from a corresponding domain. It may be the case 


that more than one attribute has the same domain. Data 
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Values from attributes having the same domain are what 
allow the relational system to associate tuples from 


Gdifrerent 2telatwoanc. 


DOMAIN STUDENT TP ALPHANUMERIC (2) 
DOMAIN STU_ NAME CHARACTER (267 
DOMAIN YEAR NUMERIC (1) 
DOMAIN LOCATION CHARACTER CHS) 


RELATION STUDENT 


ee >: DOMAIN STUDENT_ID 
SN >: DOMAIN STU_NAME 
da: : DOMAIN YEAR 
DISTRICT : DOMAIN LOCATION) 
Figure 3.3. Domains and Attributes 
Relational databases must be composed of 


relations which meet the following condition: 


Every value in the relation--i1.e.;, each attribute 
value in each tuple--is atomic (1.e., nondecomposable 
so far as the system is concerned). CRef. 4:p. 86] 


That 1S, every data value in the relation is precisely 


one value. A set of data values is not allowed within 
the relation. Null Values are allowed to represent 
values ina relation that are not known. When a relation 
meets the above criterias, it is Ssald to be normalized. 


This will be discussed in greater detail when the 
implementation of set-types within a relation 1S 
discussed. 

The format used to represent a relation is called 


the relational structure. For example, STUDENT(SID,s SN; 


a9, 


YR, DISTRICT) describes one Saeie ture —Gaeetine o1UDENT 


relation. The generic format that describes the 
relational structure may be said to be 
Relation_name(attributel, attributee,...attributen). hae 


constraints are imposed on the data values which may 
exist in the relational structure; then a relational 
schema exists. 
ce. Keys 

When an attribute with unique values within a 
relation is used to identify tuples within that relation, 
that attribute is said to be akey. A person’s social 
security number 1s an attribute that uniquely identifies 
that person. Within the STUDENT relation, the attribute 
S20), is the key. Its unique values distinguish each 
tuple within the relation. 


A key may be the composition of more than one 


eeetrwibute. For example, Pometiiemw te: atiom Si =<Fiqure 
feet c)); the combination of the attributes SID and TID 
serves as the key. A unique tuple within the relation 


could not be identified if this were not the case. If 
the key were SID or TID by itself (and not the 
combination of the two); more than one tuple would have 
the same key. In the above example, the combination of 
STD and TID is said to be a composite key. The 
combination also acts as the primary key for eines | 


relation. ea =e\(e eile be remembered that a relation 1s a 


a4 


set, with tuples being elements of the set. Because sets 
do not contain duplicate elements, each tuple of a given 
relation 1S unique in the relation. This property 
Quarantees that the combination of all the attributes 
will act to uniquely identify a single tuple within the 
relation. 

Relations may exist where more than one attribute 
combination 1s able to uniquely identify a tuple within a 
relation. This relation 1s said to Rave more than one 
Candidate key. Our STUDENT relation is such a relation. 
Tuples (or records) may be uniquely identified within the 
STUDENT relation by either the value of the SID attribute 
or by the value of the SN attribute. This 1s assuming 
there are never any duplicates in either of the two 
columns. We may arbitrarily choose either of the 
candidates to act as the primary key. An alternate key 
1S a candidate key that is not the primary key. In the 
STUDENT relation SID may act as the primary key and SN 
would act as the alternate key. 

Tuples are uniquely identified by their primary 
keys depict entities in the real world. Therefore, 
Primary keys serve to uniquely identify entities. Within 
the STUDENT relation, the tuples represent individual 
students. The values of the SID attribute identify those 


students rather than just the tuples that represent 


ee 


them. As an offshoot of this interpretation; we 
mecnoduce the following rule. 
Mieor iy RULE isXemtity integrity? 

No component of a primary key value may be null. 

pets 4:0. 89 J). 

This rule states that all entities must be uniquely 
identifiable in some way. Primary keys serve as the 
unique identification function ina relational database. 
If a primary key were to Rave a null value in a relation, 
the entity would be without a unique identification 
property. tins eWoula=  pronibit Ciliates pal ticular entity 
from being distinguishable from other entities within the 
moat ion. It 1s recommended that both whole and partial 
gee identifiers be prohibited. ECRef. 4:p. 89) 

Many times one relation contains references to 
another relation. Relation SIT; for example, includes 
references to both the STUDENT and TEACHER relations. 
These references are the attributes SID and TID contained 
ete, If a tuple of ST has a value for SID; say $1, 
then a tuple should exist within the STUDENT relation for 
student 5$1. If this were not the case, the tuple 
containing $1 in the ST relation would refer to a student 
that does not exist. 

To further elucidate matters, we must understand 
tmemecomeenat of a primary domain. Date states; "A given 
domain may optionally be designated as primary if and 


only if there exists some single-attribute primary key 
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defined on that domain" CRef. 4:9). S95) By adding ave 
its definition (from that shown in Figure 3.3); we are 
able to specify the domain STUDENT_ID as primary as 
ToOllows. DOMAIN STUDENT_ID ALPHANUMERIC (2) 
PRIMARY 
A relation must abide by the following rule if that 
relation includes an attribute that is defined on the 
Primary domain (for example, relation ST). 
INTEGRITY RULE e (Referential Integrity) 


Let D be a primary domain; and let Ri be a 
relatiomw Wiltiheoveoeum tant. A that is defined on 


D. Then, at any given time, each value of A in 
Ri must be either (a) null, or (b) equal to V, 
Say where V 1S the primary key value of some 
tuple in some relation Re (Ri and Re not 
necessarily distinct) with primary key defined on 
Ds CRef. 4:p. 89] 
The. definition eo jae Primary domain implies 
relation Re must exist. If attribute A is the primary 


key of R1, the rule is trivially satisfied. 

Finally, we introduce the notion of a foreign 
key. When an attribute in one relation is the key of 
another relation, the attribute is called a foreign key. 
For example, attribute TID of relation ST 1s a foreign 
key because its values are values of the primary key of 
the TEACHER relation. 

3. Extensions and Intensions 
An extension and intension are components of a 


relation in a relational database. 
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The set of tuples existing in the relation at any 
Qiven instant is known as the extension of that relation. 
Obviously, the extension is time-dependent. The 
extension varies as changes are made or operations are 
performed on tuples (1.e€.; update, delete, create, etc.). 

On the other hand; the intension of a relation is 
time-independent. The intension directly relates to what 
is specified in the relational schema. Hence, the 
intension is the combination of the structure of the 
relation (the naming structure) and integrity 
constraints. CRef. 4:p. 90] The naming structure is the 
name of the relation and the names of the attributes 
with their respective domain names. The integrity 
constraints are divided ime key constraints, 
referential constraints,s and other constraints. 

The existence of candidate keys implies key 
constraints. The attribute(s) which make up the primary 
key and the attribute(s) which make up the alternate keys 
if any exist, are specified in the intension. tH 
uniqueness restraint iS implied by these specifications. 
Also, the Se wman y key specification infers, in 
accordance with Integrity Rule #1; a no-nulls constraint. 
PRet. G:p. 91) 

The existence of foreign keys implies referential 
constraints. A specification of all foreign keys in the 


relation implies a referential constraint. The relations 


es 


in Figure 3.1 are examples of extensions. They also show 


the naming structure (relation and attribute names). 


B. RELATIONAL ALGEBRA 

Relational algebra defines operations that work on 
relations. Operators manipulate relations to achieve a 
desired result. It 15 very important to note that the 
result of any operation ona relation creates in another 
relation. Relational algebra 15 Said to be procedural, 
which means that the user must not only know what he 
wants when performing operations ona relation, but also 
how to get it. As previously mentioned, relations are 
set with the tuples of each relation considered elements 
of the set. Therefore any operations that can be 
performed on sets may be performed on relations which 
results in a new relation. 

Although relational algebra is not often used, it 1s 
important to understand as it 1s the backbone of other 
high-level query languages such as SQL and QUEL. 

The relational algebra may be said to consist of two 
Groups of operators. 

1. Traditional Operators 

The traditional set operators are unions 
difference, intersection; and product which are discussed 
in greater detail below. CRef. 4:pp. 2O3-ellilRet. 22: eer 


eSe-250) 
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ae Uniram 

Combining the tuples from one relation with 
those of a second relation to produce a third relation is 
known as the union of two relations. To make sense; the 
combined relations must be union compatible. Union 
compatible means each relation must have the same number 
of attributes; and the attributes in matching columns 
must come from the same domain. For example, if one 
relation (relation A) is the set of students living in 
Monterey; and another relation (relation B) is the set of 
students who play tennis; then A UNION B would be the set 
of students (relation OC) who live in Monterey or play 
tennis (or both). It should be noted that duplicate 
tuples are eliminated. 

ome Difference 

A third relations (relation C) that consists 
of tuples which occur in relation A but not in relation B 
is Said to be the difference of relations A and B. For 
example, using the same relations used in the UNION 
example, A MINUS B is the set of students who live in 
Monterey and who do not play tennis. Note the relations 


must be union compatible. 


ex. Intersection 
Sac Niram relation ™ (Trelatiom CC) -containing 
common tuples between two other relations (relations A 
and B) is Said to be the intersection of relations A and 


e7 


B. For example, again using the relations used in the 
above examples; A INTERSECT B is the set of students who 
live in Monterey and play tennis. Agains the relations 
must be union compatible. 

ds Produvet 

The concatenation of every tuple of relation 
A with every tuple of relation B (resulting in relation 
C) 1s known as the Cartesian product of relations A and 
Bi If relation A has m tuples and relation B has n 
tuples, then their product has m x on tuples. This 
product is represented by A X B or A TIMES B. FOig 
example, the students living in Monterey TIMES a student 
identification number relation is the set or relation of 
all possible student JU at nye’ iin Monterey/student 
identification number pairs. 
au Special Relational Operators 

The special relational operators are selection, 
HroOjections joins “ana divicien. 

aa select iron 

The selection operator outputs a horizontal 

subset (row) of a given relation. A tuple 1s now 
included in the new relation. A tuple may be extracted 
from a relation by specifying the relation name followed 
by the keyword WHERE followed Dy a conditional statement. 
This conditional statement involving attributes is a 


Single expression or may involve a combination of Boolean 
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expressions. Figure 3.4(a) shows the selection of the 
metetiaon STUDENT WHERE SN = "ROSE". Figure 3.4(b) shows 
Meemae lection of STUDENT WHERE YR o> 1. Figtvre Sire) 
shows the selection of STUDENT WHERE YR < 3 AND DISTRICT 


= "SAL". 





(a) 


(BD) 






DAS RET 





Figure 3.4. Selection of Courses Relation 


Sree TO }Ce€t1on 
Specified attributes are selected from a 
heretiom BY the projection operation (projection draws 
out columns from a relation). The operator outputs a 
vertical subset (column) of a given relation. Any 
duplicate tuples within the attributes selected are 
eliminated. An attribute may only be specified once in a 


a”, 


projection operation. The operator may also be used to 
change the arrangement of the attributes of a relation 
because the attributes are specified in a left-to-right 
order. Figure 3.5 shows the projection of TEACHER on the 
COURSE and CREDIT attributes. The projection is denoted 
by TEACHER CCREDIT, COURSE] (relation name followed by 
attributes to be projected in brackets). Notice that the 
order of the attributes from the original relation is 
changed due to the order they appear in the brackets. 
Duplicates would have been eliminated had they been 


present. 












Figure 3.9. Proejectioneot | eachensmetarlon 


Ee. Join 
RA combination of product, selection, and 
sometimes projection 1S known aS aA join. A JOIN B 
denotes a join between relations A and B. To join two 
relations in ae natural jOlin, there are three steps. 
First take the Cartesian product of the two relations; 
then do a selection to eliminate some tuples. Then if 


desired, remove duplicate attributes through projection. 
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If duplicate attributes are not removed it is called an 
equijoin. When the term join 1s used, it 1s understood 
to mean a natural join. The natural join of the STUDENT 
and TEACHER relation over the attribute DISTRICT appears 


Mer igure 3.46. 


re] Pa] srsrnicr| ve] cause] eaves] oreo 
Ea 
Ea 
ee [see (eee foe 


prouUne 3.6. Natural Join of Student and Teacher 
Relations Over District 






ae Division 

If the domain of a iunary relation is also a 
domain of a binary relation; then we may divide the binary 
Semeatian by the unary relation. This results in a unary 
relation consisting of the uncommon domain from the binary 
relation. The resultant relation contains an attribute 
value of the uncommon domain if its corresponding entries 
in the common domain contain all the values of the divisor 
domain. (Ref. Leer Ou Figure 3.7 shows’ an example of 
Sem an operation. The relation TEACH acts as the dividend 
emo relation LOCATION is the divisor. Mie@m~pesciit is 1D (1D 


= TEACH/DISTRICT). Iie: the figure, the “only teacher 


ou 


identification number CT 1D)? pe) & which there is 


corresponding values of both SAL and CML in TEACH is Te. 


LOCATION itjB; 


DISTRVEs 


(eG. 






= 
20) 





Figure 3.7. Division Operation 


ee RELATIONAL DATA MANIPULATION LANGUAGES 

Data Manipulation Languages (DMLS) are languages 
through which relational databases may be accessed. There 
ae ae Gibne main methods for manipulating data in a 
relational database. These are: relational algebra, 
relational calculus; transform-oriented languages; and 


Qraphic metnods lRetf. eC:ppe eae seas. 

Relational algebra uses standard set operators to 
achieve the desired result: a new relation. It is 
procedural (the user must not only know what he wants but 
also how to get it) and therefore difficult to use. 

Relational calculus is non-procedural (the user only 
tells the system what he wants, not how to get rite The 
concept was first proposed by Codd who also presented a 
language based on this calculus; Data Sublanguage ALPHA. 


Be 


however ; this language was never implemented. CRef. 4:p. 


ees) 

Transform-oriented languages are nom—Onrocedur al 
languages. The most popular of these languages is 
Structured Query Language (SQL). SQL provides retrieval 
and update facilities as well as many others. Major S@QL- 


based DBMS products include S@QL/DS, System R;, and ORACLE. 
Meer. 2:0. 4373) 

QUEL is based on tuple relational calculus and 1s very 
memiiayr to SQL. It 16S non-procedural so the user does not 
need to concern himself with the underlying physical data 
Semuc ture. 

Query-by-example (QBE) is a Gea vealiemethod.° to 
access a database. In this method the user 1s presented 
with a picture of the structure of the relation. This 16 
Suitable only for terminal use and may not be embedded in @e 


host language. 


ere: 


IV. ADVANTAGES OF RELATIONAL APPROACH 


Ai INTRODUCTION 

As mentioned earlier, relational database management 
systems are enjoying widespread popularity. According 
to industry observers; it will be the most prevalent DBMS 
technology for most data-processing applications. Its 
understandability; data independence, power and ease of 
USE, and theoretical foundation are the earmarks of 
relational DBMS. The distinctive features of DBMS are 


described below. 


B. RELATIONAL DBMS ADVANTAGES 
1. Ease of Understanding 

One of the main motivations for the research work 
that resulted in the relational model was what Codd 
termed the "communicability objective”. (Ref. S:p. fee 
Achieving this objective means users and programmers 
alike are able to communicate with one another about the 
Gatabase because they both have a common understanding of 
the data; the model is structurally simple. 

The relational DBMS, using tables with rows and 
columns which are universally understandable, provides a 
logical view of data. This means that the database 
concept is more easily understood to Many users as 


opposed eta) those who understand the hierarchical or 
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network based models. Navigation through tables is not 
necessary because there are no pointers connecting the 
tables. This 1s in contrast to the hierarchical and 
network models where one must maintain a position where 
one 1S working when performing most operations. Tt 
should be noted that while users may logically view the 
data in the database as a collection of tables, it is 
rarely stored as such in physical memory. Thus, the user 
mentally sees the database as being; in all cases; a 
collection of tables or files with one record type per 
table. 
2. Data Independence 

Data independence 1S a very important feature of 
any database system. If an application is data 
dependent, then the storage structure and accessing 
strategy cannot be changed without having a pronounced 
effect on the application. Data independence means 
applications need not worry about how the data 156 
accessed or physically stored. Codd considered the date 
independence objective as the most important motivation 
for the research which spawned the relational model. 
meet. O26. 1103 

Under the relational model, data 1s accessed by 
its value in tables, not by its lEEatron. Likewise; 


tables are related by value; not by pointers connecting 


eS 


the tables. Also; the system determines access paths, 
not the program. 

Two reasons why applications must be data 
indepencdemt are [trRet. 4!:p.) Vea. 


- Different applications need different views of the 
same date. 


- The DBA must not be forced to change existing 
applications if he desires to change; due to changing 
requirements, the storage structure or access 
strategy vor beth) = 

Data independence greatly affects both end-users 
and programmers. From the end-users viewpoint, no 
knowledge of the physical database is required for 
access; data ordering in tables 1S insignificant. 
Therefore, requests for data may be non-procedural. This 
lack of having to know the underlying data structure 
combined with the ability to just let the system know 
what the user wants (and not also how to get it) enables 
users with kieet le computer/database experience to 
immediately interact with the system. 

From the programmers vantage point, data 
independence also offers advantages. Many hours are 
spent making changes to existing application programs. 
For example, if relational queries are embedded in a host 
language application program and the nature of the data 
changes, thus the current path to the data 1s no longer 


optimal and the relational query language program only 


needs to be recompiled. CRef. 6:p. 93] However, if the 
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Same situation occurred in a CODASYL database, the 
program would have to be rewritten by someone - who 
understands the effects of the data change on the 
application program. The time saved due to the concept 
of data independence May be spent writing new 
applications for which there tends to be a shortage of. 
AS organizations grow, there are sure to be 

desired changes in data representation and growth in 
data types. Data independence affords organizations this 
luxury without inflicting damage upon their application 
programs. 

3. Power and Ease of Use 

Relational operations are powerful and easy to 

use because they enable users to operate or process 
multiple sets of records at a time ina single statement. 
Hence ; relational operations are less procedural. Beit 
example, in our TEACHER relation, the SQL statements 

Set CT mace 

FROM teacher 

WHERE aGistricw.. = =MLy” 
would operate on the entire relation (versus a record by 
record operation as is done conventionally. The 
advantage of being able to process whole tables of data 
at one time is obvious. Relations (or sets) are treated 


as operands in the relational approach. 


ce 


Less procedurality promotes increased 
productivity as the retrieval and update operation 
burdens are placed on the system. 

a. Theoretical Foundation 

Linlike the hierarchical or network structure,s the 
relational DBMS are based ona theoretical relational 
model. Relational systems used this model as a 
foundation while network and hierarchical DBMS have 
evolved over the years to accomplish as many requirements 
as possible. Because relational systems have a sound 
theoretical foundation, the results of relational 
operations are easily predictable. 

The next chapter examines deficiencies of the 
relational model and offer solutions to these 


shortcomings. 
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V. THE RELATIONAL APPROACH: PROBLEMS AND SOLUTIONS 


A. INTRODUCTION 

Despite growing popularity and the belief that 
relational DBMS will be the system of choice in the not- 
too-distant future, they are not without their faults. 
The performance of relational DBMS is not yet comparable 
to the standards set by the more familiar models (1.e., 
network and hierarchical). This performance degradation 
comes from heavy input/output rather than from pure 
number-crunching CRef. 7:p. 26] and is therefore less 
responsive to technological improvements. Relational 
systems require significant amounts of CPU and memory 
resources, and file (database) maintenance 15 very time 
consuming. Others claim relational DBMS are not well- 
suited in a transaction-heavy environment because of 
lengthy run-times, that the relational concept 15s 
unnecessarily complex, and that relational DBMS are very 
weak in the areas of database recovery and backup. 
However , to many database specialists and researchers, 
the main limitation of relational systems is their 
semantic meagerness. This chapter will focus on the 
Problems caused by this semantic meagerness and will 
offer solutions to these problems. The main goal will be 
on easing the formulation of queries and increasing the 
system’s semantics. 
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B. MENUS AND VIEWS 

Database technology was an esoteric subject only a 
few years ago. ae is now alg interest to add 
organizations. Relational database technology, always a 
favoGg il vemem academicians (probably because of its 
theoretical foundation) 5 is now seeing widespread 
Practical applications. Due to the rapid improvements in 
and importance of the technology, users with little 
computer experience and even less database experience 
must frequently interact with the organization’s 
database. 

Tee Problems 

To a naive seeker of information from the 

database via a high-level query interface, it is not at 
@ll obvious why one question may be asked and another may 
not be ~-asked (Ret... ‘Sip lca. The naive user expects 
certain queries to be answered such as: 


SELECT NAME 
WHERE job _ title = "manager' 


but does not understand why a query such as "How 15 Paul 
Smith related to the Personnel department?" cannot be 
answered (assuming of course the user is first able to 
formulate a query to retrieve such information). That 
the second query iS a query on meta-data and there does 
not exist a querying facility for database schema (as 
opposed to data values) does not occur to the user. Th1sS 
lack of familiarity and knowledge of the relations within 
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the database places a constraint on the user that causes 
him frustration and inefficiency while using the system. 
Im many government and commercial databases, 
relations of a high degree (thirty or more) are not 
Bmeommon tRef. 9:p. 380). Users should not be burdened 
with remembering the attributes of each relation (much 
less even lower-level details such as the format and 
units of the attributes). 
our Selutions 
We suggest the implementation of a command called 
MENU which allows the user to become familiar with the 
names of a relation’s attributes and its respective 
domains. The command is typed, followed by the name of 
the relation with which the user has questions about. 
For example, if the user types: 
MENU TEACHER 


The system would reply: 


Relation = TEACHER 

TID alphanumeric 
SOUR SE string 
SeSsice =e ah als 
CREDIT integer 


ON BS MPs Ee § string 


This command, at the user’s fingertips, provides a quick 
"explanation" of the relation or a refamiliarization for 
experienced users. This 1S more effective than a 
Hardcopy located in a manuel which may be lost, 


mutilated, not updated as the relations changes and which 
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is time-consuming and difficult to use (1.6.3 toecet ine 
the appropriate information). 

A reference attribute CRef. TO: pb. e€98) has sam 
entity occurrence as its value. If an attribute is a 
reference attribute (1.e., 1ts domain is another entity or 
relation) the system must indicate this. For example, if 
the domain of the DISTRICT attribute in our TEACHER 
relation was no longer just of type string but waS now a 
relation itself consisting of attributes MAYOR and 
POPULATION, the system should respond to the command MENU 
TEACHER with the following : 


Relation = TEACHER 
alphanumeric 


ERBURSE string 

EES Ze string 

ERED It integer 

DSR Ted MAYOR string 
POPULATION integer 


Lt the user was somewhat familiar with the 


TEACHER relation but was not quite sure of a particular 


attribute’s properties (such as its domain), the system 
Should allow the user CO access that particular 
attribute. A reference attribute and its attributes and 


their respective domains may be examined by entering a 
command in the familiar dot-notation. In this example 
the user may type "MENU TEACHER.DISTRICT."” The system 
would respond with 

AttTribuUtes=,DiStRiel 

MAYOR string 


POPULATION integer 


GC 


The system should allow such notation to any degree of 
nesting. Tt is obviously in the database administrator’s 
best interest to limit this degree of nesting to a 
reasonable level to provide the users with a less complex 
Gatabase. 

"The totality of data in ae data bank may be 
viewed as a collection of time-varying relations.’ (CRef. 
mee. 3/79) It 15 due to these changes in the make-up of 
the relations in the database that a command such as MENU 
end its extensions must be available. 

Another solution to the problem of users’ 
unfamiliarity with relations within the database would be 
to have a command that when called presents a graphical 
representation of a particular relation (humans remember 
Mictures very well). This command serves the same 
purpose as the MENU command mentioned earlier, but may 
Provide a more concrete image of the tabular data 
structure to the user. The actual data ina relation 


the user desires to View will Neewee ShnOWnM in order to 


prevent any possible unauthorized access. A skeleton of 
the relation, with the name of the relation, its 
attributes, and their domains 1s displayed. If an 


attribute is areference type, the skeleton would name 
the attribute and its associated attributes and their 
domains. This relation (or reference type attribute) 
within a relation should be able to be viewed by the user 
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through "relation reduction’; the techwnlaqtevet .e~-4ani nae 
relations within other relations. A command such as 
"VIEW TEACHER" would show (see Figure 5.1) the skeleton 


of the TEACHER relation. If the DISTRICT attribute wants 


TEACHER 


p TID GHURSE GESlZe CRE Die DISTRiIGH 


a SEPA? SSSR pS SS SYS SS SSS SSS PSSST, 


ALPHANUMERIC STRING STRING INTEGER; REFERENGS 


Figure 3.1. Command View Teacher 






to be examined, the command "VIEW TEACHER.DISTRICT" 

should present to the user the attribute name (in this 
case; DISTRICT); the name of the owner relations, and 
its associated attributes and domains (See Figure 3.2). 
The MENU and VIEW commands are very simple commands which 


Provide users with a fingertip tool to examine closer 






(TEACHER) 
Piste iet 
MAYOR POPULATION 
STRING PNTEeeER 
Figure 3.e. Command View Teacher.District 
each relation Within the database. These commands will 


enable users to Rave a better understanding of the 
semantics of each relation which will aid in query 


formulation. 
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Bee oe l-TYPE ATTRIBUTES 
As mentioned previously; a major limitation of the 
relational model is its semantic meagerness. This 
meagerness 1S exacerbated by the lack of a set as a type 
for an attribute. 
i. Problems 
The lack of a set-type restricts relational 
schemas from modeling in a complete and expressive way 
the real-world relationships between entities which may 
emamy Gistinguishable thing or object). (Ref. 10:p. 2864] 


An example will 1llustrate the problems of not 


having a set-type attribute with respect to 
expressiveness shortcomings. Normalized systems require 
a record to be a collection of single-valued facts. hy) 
other words, each data item within a record must be an 
atomic (nondecomposable) value. A relation where this is 
the case 15 said to be in first normal form (1NF). A 


more complicated data structure than a two dimensional 
table with homogeneous entries in the columns would be 
needed if the iNF requirement was not met. Figure 5.3 1s 
a relation we will call SCHOOL. If the relation wishes 
to depict the instructors each particular student has 


then a tuple must exist in the relation for-~ each 


matey labial gaacstructoOnaanat student has. This is called a 
nomalized relation. However, Figure 3.4 condenses this 
relation by relaxing the INF constraint. Whereas the 
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SERGE 





Figure 3.3. Relaticn Sehacz 






SCHOOL 


SOPH SMITH, JONES, DRAKE 


Figure 5.4. Relation School with Set-Type Attribute 





schema previously would have been SCHOOL (sid : INTEGER; 


VG : STR LNG: INStruetor : STRING) with a set-type 
implemented it becomes SCHOOL (sid ; INTEGER $5 yr 
STRING; instructor .: SET Gr ~Siki Nae The attribute 


INSTRUCTOR 1s now a set-type attribute. The set-type 
Saves space by reducing redundancy and truly allows the 
record to contain all the facts about an entity [CRef. 
Bo seas By retrieving this one record, the user will 
see the natural or true relationship between the student 
and his instructors. If the INSTRUCTOR attribute was not 
a set-type; a record that was retrieved would not tell 
the whole Stor y Chee ws the user may believe the 
instructor retrieved from one particular record was the 


students’ only instructor which may not be the case). a 


46 


lack of a set-type in addition to placing a restriction 
Siiea record from possibly containing all the facts about 
am entity creates other problems. Many real-world 
entities do not have single-valued information which can 
easily be represented within the field of a record (Ref. 
mep> 116). For examples a relation called TEAM may not 
happen to have any single-valued facts. The team may 
have numerous players; many games; several coaches; and a 
large number of potential recruits. A number of 
Meretions (i.e., {fEAM_PLAYERS, TEAM _COACHES, etc.) each 
containing an attribute 1S necessary to store the data 
rather than simply one relation called TEAM. 

Many changes have been proposed to enable the 
relational model to better capture the semantics between 
real-world entities. Some have proposed adding semantic 
Gata model capabilities to relational systems thus 
SombIining the advantages of both models [Ref. 10:p. e866]. 
Others have proposed extensions to the relational model, 
such as Codd’s extended relational model RM/T. 

However; relaxation of first normal form (1NF) 
constraints is the most common proposal. This proposal, 
in effect; would allow attributes to be of a set~-type. 
As mentioned earlier, the lack of a set-type inhibits the 
relational model’S expressivity. Another problem which 
arises from the lack of a set-type attribute 1S query 


formulation. Asserting that the lack of a set-type is a 
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main cause of user difficulties in formulating many 


Queries, focus will now be directed to this problem. 


To make the query formulation issue more 
transparent, we introduce four relations. The key of the 
relation 1s the attribute or set of attributes 


underlined. (Reta, tieto wee 
student(sname, district) 
studies(sname,s course) 
developed(tname;s course) 
teacher (tname, district) 

As stated earlier, due to the lack of a set-type 
some entities are inappropriate for storage ina single 
relation. The TEAM relation served as an example of 
this. With no single-valued attributes; a number of 
smaller degree relations would better represent the 
enti bv. 

For the same reason, in the above example we are 
forced to create a separate relation, STUDIES, which 
lists the courses a particular student 1s currently 
Ss tua yima. If a set-type been implemented, this data 
could have been stored in the STUDENT relation; giving 
the user a more concise; semantic view of the date. 

Why does the lack of a set-type make user query 
formuletion a more difficult process? In the above 
example, two relations are needed (STUDENT and STUDIES) 
where if a set-type were available, one would suffice. 
When a user attempts to formulate a query; he must qo 
through the mental gymmastics Of jolmimgunpelauronomte 
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result in the desired final relation from which the data 
he desires will be retrieved. This 15 not an easy task 
for users of any experience level. REGeEnt Griticism of 
relational systems has been the difficulty users have in 
understanding joins. Languages such as SQL and QUEL are 
manageable for programming but not for users$s$ and the 
reason 1S brought to light with joins (Ref. 1le:p. 70]. 

An example best demonstrates the Query 
formulation problem. Consider the following query in S@L 
rorm : 

SELECT student.sname, teacher.tname 
FROM student; teacher 
WHERE student.district = teacher.district 
This is obviously a very simple query. Only two 


relations are required to be joined. The equivalent 


relational algebra expression is : 


PROJECT snames tname (student iXi teacher) 


The projection retrieves the (sname,; tname) pair who live 
in the same district. 

for an experienced programmer or users the 
translation of "Find the names of the students and 
teachers who live in the same district" into a query 1s 
mores Veryverdifficult. For an inexperienced user, the 
mental manipulation required to answer this query 1s not 
Guite so easy but yet still manageable. If sname and 


tname been set-type attributes, the TEACHER and STUDENT 
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relations could have been combined into one relation in 
the beginning (a new relation called STUD_TEACH) and the 
Guery would have been reduced to: 


SELECT sname;, tname 
FROM stud_teach 
WHERE stud teach.district = X. 


As can be seen, the gain (query-wise) of having set-type 
attributes in this particular case 1S minimal. This will 
not be the case with more difficult queries. 

Let’s consider a more challenging query: 


SELECT sname 
FROM student 
WHERE district IN 
(SELECT district 
FROM teacher 
WHERE tname IN 
(SELECT tname 
FROM developed 


WHERE course = "math")). 

Translating this query into its Engler 
equivalent is not so easy. However s this cannot be 
attributed to SQL syntaxs which in this case; is very 
basic. Since any query one can formulate in SQL may also 


be formulated in QUEL, write the query in QUEL-like 
structure to see if this contributes to the semantic of 
the query 

SELECT student.sname 

ERoM Student; teacher, developed 

WHERE student.district = teacher. districe AND 


developed.tname = teacher .tname AND 
developed.course = "math". 
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This 1S equivalent to: 
PROJECT sname 
(PROJECT sname, tname (student i:X: teacher) 1X 
PROJECT tname (developed WHERE course = "math")). 

The first nested projection retrieves the (sname, 
tname) pair where sname and tname live in the same 
district. The second nested projection retrieves tname 
after it hes been selected where tname developed the math 
course. These two relations ere joined over the 
attribute tname, and sname 1S projected from this 
relation. 

This transformation to a QUEL-like query does 
little to help us with the semantics of the query. This 
1S because one still must visualize the query as a series 
of natural joins occurring between the relations over 
common attributes. As earlier expressed; humans do not 
eaSily digest joins, especially more than a couple teking 
place within the same query. The English translation, 
"List the students who live in the same district as the 
teacher who developed the math course” is difficult to 
formulate into a query because i1t requires joins of three 


separate relations. 
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One more example will further illuminate matters: 


SELECT sname 
FROM student 
WHERE Grstriet IN 
(SELECT “Girs Gmaile v 
FROM teacher 
WHERE tname IN 
(SELECT tname 
FROM developed 
WHERE course IN 
(SELECT course 
ily studies 
WHERE sname = student.sname))). 


This query 1S Stiliemore diftveuree Again, this 
is not due to the SQL syntax, but may be attributed to 
another situation where multiple joining of relations is 
required. The equivalent query in "QUEL-like” SQL 
structure rs: 

SELECT student.sname 


FROM student, studies, developed, teacher 
WHERE student.district = teacher .district AND 


student.sname = studies.sname AND 
studies.course = developed.course AND 
developed.tname = teacher.tname 


This 1S equivalent to the following relational algebra 


expression: 


PROJECT sname (PROJECT sname, tname (student :X; 
teacher ) 
[INTERSECT 
PROJECT sname, tname (developed 1X4 


studies). 

Here, the first nested projection retrieves the 
sname,;, tname pair where sname and tname live in the same 
Gl ty. The second projection retrieves the same pair 
where sname studies the course developed By tname. The 


intersection and subsequent projection results ina list 


ayes 


of the student names who live in the same district as the 
teachers who developed the courses they are studying. 


This query in relational algebra is semantically 


more clear to users as a set operators, namely 
intersection, appears in the expression. The difficult 
task of joining relations is "softened" by the presence 


of the intersect operator which serves to re-establish 
the set concept in the user’s mind. The concept of 
natural joins, much less a sequence of them, is not 
supported or strengthened By real-world occurrences with 
which users may identify. On the other hand, users 
naturally compartmentalize and willingly think in terms 
of sets but do not often mentally perform natural joins.) 
Thus the lack of a set-type, by requiring multiple 
joins to take place, forces users to think in terms of a 
physical database structure and a concept they cannot 
relate to the real world 0 lS as opposed to the 
natural concept of sets and set operations. 

The notion that each relation within the database 
is a set where each tuple (or record) within the relation 
is an element of that set is inconsistent within the 
user’s mind. When the user employs Basic set operators 
such as SELECT, he expects a list or set of xX to be 
retrieved from the relation specified. When a join (or 
multiple join) is performed, the user must mentally 


sequence through a series of steps that deal with lower- 


ay: 


level concepts. This "low-level traversal" causes the 
user to lay aside the thought of the set to be retrieved 
until the notions of common attributes; domains, etc. are 
dealt with. 
e. Seodeprons 
The proposed solution to the problem involves the 
obvious addition of a set-type to relational DBMS. Any 
attribute may be declared of type set within the database 
schema. The declaration 1s of the form : SET _OF (type or 
entity name). Our sample database schema may be defined 
as follows: 
student (name : STRING; Pes trier oe & >: SET OF develope 
GS tr ics 
SiR ENG? 
develop (name : STRING; course : STRING; classes : SET _OF 
study) 
study (name + STRING; course «= Sire 
teacher (name : STRING; district = SiIRING) fret oe 
The query "List the students who live in the same 
district as the teacher who developed the math course" 
may be programmed as follows: 
SEBEET cetudent 
WHERE student.instructor CONTAINS 
develop WHERE 
develop.course = "math" 


AND student.district = teacher.district 


Noticeably absent from this query is the presence 


of a join. Although more than one relation is involved 
in answering the query (student and develop) the user 
does not see the explicit 2 mar Instead, from the 


student relation we "placidly migrate” into the develop 


24 


melat1ion. This more delicate process 15 easier for users 
to see mentally. 

The query "List the students who live in the same 
district as the teachers who developed the course they 
study" may be expressed as follows: 

SELEGr student 
WHERE student.instructor CONTAINS 
develop WHERE 
develop.classes CONTAINS 
study WHERE 
study.course = course 
AND Student.district = teacher.district 

Again, this query; when compared to the previous 
query (without set-types), 1s easier for users to 
understand and to formulate due to the removal of 
Sree icit joins. The user 1S again immune from having to 
think about lower-level items such as attributes and 
common domains for joining purposes. 

It should be noted that the DBMS with the set- 
type attribute implemented took Significantly fewer 
physically coded lines to achieve the same result as the 
system without the implementation of a set-type. Bios t 
the students who live in the same district as the 
teachers who developed the course they study" required 
twelve lines £0 formulate versus seven with the 
implementation of a set-type. Likewise; "List the 
students who live in the same district as the teacher who 


developed the math course" required nine lines of code 


without a set-type implemented and five lines of code 


as 


with the set-type implemented. This is directly 
credited to the set-type attributes which eliminate the 
need for the nested SELECT operation. The SELE@a 
operation is what ties together attributes in a complex 
nested query. However; this operation 1S made obsolete 
in the set-type attribute example since the nested 
queries within a complex query do not call for one 
specific attribute value but require the value to be a 
member of a set of relations. 

Also; the query with the set-type attribute 
implemented 15 not so disjointed with respect to 
readability. The query reads more easily than the non- 
set-type attribute query which is very choppy ‘amd 
mentally difficult to tie together. Although the degree 
of nesting is the same in the two queries, the set-type 
query 1S more concise and manageable. This conciseness 
is due to the nested queries within the whole query being 
sewn together by common relations. While the 
conventional query 1s tied together by common attributes 
(thereby requiring the relations to be explicitly named 
within each nested query with a FROM clause), the set- 
type query 18S tied together by sets of relations. 
Conciseness 1S important as "query recognition' is also 
an issue that must be addressed. 

The bedfellow of "query formulation" may be 


considered "query recognition". Query recognition 1s an 
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important issue for a variety of reasons. Dienagity™ or 
unexpected data is retrieved from the database, the query 
must be examined to see why this occurred. if, sit. Canna t 
be determined what the query is trying to accomplish then 
a real problem exists. A serious deficiency in the make- 
up or design of the database may exist which goes 
Mmamoticed. Obviously; the more time and resources 
expended to determine the purpose of the query (1.e., 
what the query 1S doing or trying to achieve), the more 
costly the process is to the organization. This assumes 
that the answer to the query is "Caught" or determined by 
the user to be wrong. The ability to troubleshoot 
Queries is much easier if a set-type attribute 15s 
implemented in the system. As mentioned earlier; the 
readability issue coupled with the common-relation thread 
factor makes this possible. 

Query recognition is also important since new 
queries that must be formulated may be based on or 
structured around other queries which have already been 
formulated. The chance of a user formulating a correct 
query on the first try is very slim. If a template or 
Similar query is available to follow; then the task is 
made much easier. A user 1S more apt to recognize a 
query with a set-type attribute implemented because there 
are fewer physical lines required to evaluate or 


"debug". The trick here is finding that template or 


2) 


Similar query. The set-type implementation lends itself 
well to this purpose due to readability coupled with 
fewer lines to evaluate. 

To this point we Nave discussed implementation of 
set-type attributes in our relational schema (i1.e.,; 
instructor: SET _QF develop and classes : SET_OF study) 
with respect to query formulation. Now examine the 
implementation of a set-type with respect to system 
efficiency will be examined. A relation called TITLES 
(See Figure 3.5) consists of three attributes : NAME, 
YRS _ PRO, and YR_TITLE_ WON. The domains of the attributes 
are STRING, INTEGER, and INTEGER respectively. Again; 


without the implementation of a set-type, notice 


PRI EeS 


eos [se 





Figure 3.3. Relation levies 


ae 


the redundancy required to represent the data. Note also 


the each record only tells part of the story as each 


professional in this example, wona title in more than 
just the year indicated in a particular record. Now 
formulate a simple query that says, "Name the years, if 
any » in which Jones Ras won tournament titles." The 


query in S@L form would be: 
SELECT yr_title_won 
FROM titles 
WHERE name = "Jones" 

To retrieve the proper answer to the query, the 
system had to find the first record where the value of 
the name attribute was "Jones" and then retrieve the 
Gata value of the YR_TITLE WON attribute. In this 
Darticular relation, there are three records with the 
name "Jones" so all three records are retrieved with 


respect to the YR_TITLE_ WON attribute value. 


Consider another TrelatwvOmwealled TITLES2e (see 


Figure 3.6) which consists of the same attributes as the 
relation ee ees specifically NAME , VRowe ROS sama 
feast 1flLe WON. The domain of the NAME and YRS_PRO 


attributes remain the same (STRING and INTEGER 
respectively). However ; the YR_TITLE_ WON attribute in 
[eee SeBis a set-type attribute. Its domain 18S now 
Sseieeo- INTEGER. This 185 Similar to the "instructor" and 
"elasses" attributes in our previous schema, the only 


difference being their domains were sets of other 


ayy, 


relations. To extract all the “Vears 9 Jone=. wenn tee = 
then it can be done with the following query: 

See wa yr_title_won 

FROM titlese 

WHERE name = "Jones" 
This assumes that a high_level query language is able to 


retrieve all the elements of a set-type attribute which 


will be discussed in qreater detail later. 


Pieteese 
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Figure 3.6. Relation Titlese 











Although the query 1S, except the name of the 
relation exactly like the previous query; it One 
requires one record to be accessed. The set-type 
attribute saves the system from having to search for 
other records with the name "Jones". 

Other queries may also have an effect on system 
efficiency. For example, if the user wishes to know the 
total number of titles "Jones" has won, the system must 
search the relation for each entry with the name "Jones" 
and keep a running total of such entries. This assumes 


that there is an entry only if a title@ was won that 
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year. With the set-type attribute implemented; a query 
could be constructed as: 

Se LEGd count(yr_title won) 

re titlese 

WHERE name = "Jones" 
An assumption is made that the atomic items within the 
set of a set-type attribute are distinguishable. This 
query iS similar to the query with the system without a 
set-type implementation. Such a query would be: 

Sie | SOU nwa) 

FROM titles 

WHERE name = "Jones" 
The difference lies in the work the DBMS must do. The 
system "remains" at the appropriate record with the set- 
type attribute while the count operator totals the titles 
won. Without a set-type attributes the system must again 
sequence through the records. 

Had the domain of the YR_TITLE_ WON attribute 
included a null value (1.e., the relation included other 
attributes therefore arecord in the relation no longer 
implied a tournament title); the system would have to 
have done more work checking to see if in fact a title 
waS won. Each YR_TITLE_ WON attribute value would had to 
Pie @eeegmeaonecifically checked (i.e., the presence of a 


record with the value "Jones" would not be enough to 


assume a title win.) 
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A set-type attribute which 15 2 user-declared 
type adds tes vince, to relations and query 
formulations. For example, define a type as follows: 

type AFTER_YR = integer > 1976. 
Our scheme for welation Teele 1s 

titlese (name : STRINGS Yr S pre : INTEGER $ 

yr_title_won 

Se 1 OF SINVeSbER oe 

Now suppose the organization utilizing the database is 
only interested in titles won after 1976. The domain of 
YR_TITLE_ WON without a set-type implemented should now be 
integers greater than 1976). A domain change is desired 
so faulty updates will not occur and improves system 
performance since records will now be eliminated from the 
relation that do not contain applicable values. The 
relation would have to be modified as the tuples with 
values less than 1977 under the YR_TITLE_ WON attribute 
are-no. longer valid: This is harmful for two reasons. 
Firstly; the data that will exist in the relation 1s no 
longer the complete data for an individual player. 
Tournament victories prior to 1977 are no longer 
represented therefore the relation does not describe the 
real world accurately. Secondly; if the organization 
wishes to have this data available at a later time the 
relation must be modified again. 

The set-type attribute combined with the 


Previously user-declared type AFTER_YR precludes changing 


ies 


the relation rr ESee which does have YR_TITLE WON 
attribute values less than 1977. For example, if the 
years that "Jones" won titles after 1976 were selected, 
the query could be formulated as: 


SeeeCyT yr_title_won 


FROM titlese 
WHERE name = "Jones" AND 
yr_title_won IN arver vr . 


The query retrieves the desired data while the relation, 
which did not have to be modified, accurately represents 
me real world. 

Because the user 1s not required in the query to 
explicitly state the year (1.0.5 1976) after which 
"Jones" won titles, greater flexibility 1s achieved. 
Should the organization decide to change or impose 
different constraints on the data to be retrieved from 
the database, the DBA must only change the value of the 
user-declared type. For example, if at a later time the 
OrGanization 1s only interested in titles won Dy players 
after 1980, we simply change the user-declared type to 
"type AFTER_YR = integer > 1980."" The user need not 
necessarily be aware of this change. All the user must 
knNOW 1S how to formulate the query which is the exact 
process used previously. As long as the user 1S aware 
of the names of the user-declared types, he may follow 
the template of previous queries. A standard set of 
user-declared types could be implemented in queries by 
users with only the DBA having to know the underlying 
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details. For example, an organization may be interested 
in titles won before, during, and after a certain value 
which changes on a regular basis. Types could be declared 


(i.e@., BEFORE_YR, DURING, and AFTER_YR) and users could 


use in their queries. The regularly changing value or 
reference 15S of no concern to the users. Aside from the 
added flexibility; time and resources are saved since no 


changes to the database are necessary. 


D. QUERY LANGUAGE MODIFICATIQNS 

The implementation of set-type attributes in a 
relational schema requires examining the implications of 
such an implementation to query languages. The first 
normal form (INF) requirements cannot be relaxed without 
considering the effect such a relaxation will fave on our 
ability to retrieve, update; and store information in the 
Gatabase. 

As mentioned earlier, the 1NF requirement ensures all 
Gata within the relation will be nondecomposable (that 
iS, atomic 1n nature). THis means that each piece of 
Gata retrieved will be atomic or single-valued. 
Likewise, the database may only be updated with similar 
values such as those from the same domain. 

a Problems 

The need to address the issue can be clearly seen 


by retrieving the INSTRUCTOR attribute values from the 
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SCHOOL relation (See Figure 5.3). awamne an instructor of 
student 126" would normally be programmed as follows: 
SELECT instructor 
FROM school 
WHERE Sid = “dee. 


memever, if the attribute (in this case, INSTRUCTOR) is a 


set-type, all the data 1S retrieved. This presents 
problems if there are numerous values Wt Race eat he 
eetribute. In this instance, all three values (SMITH; 


JONES and DRAKE) would be returned as the answer to the 
query. 

A user-invisible relation is the most common 
Ppreposal for storage of set-type attributes. ane 
example, the values of the INSTRUCTOR attribute in the 
SCHOOL relation would be stored ina separate relation as 


shown in Figure 5.7. Naturally; a reference must exist 


Figure 3.7. User Invisible Relation 






which connects this separate, user-invisible relation to 
the tuple which "owns" it. 

Common proposals have been for the adoption of 
the familiar dot notation tG retrieve data from 
attributes which are reference attributes. it 1s 
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considered that a reference aw@erloute ts be an attribute 


which has an entity occurrence as its value (Ref. 10:p. 


eeone For example, the attribute CEASSES ae arc 
following schema would be considered a reference 
attribute: 


develop(name : STRINGs classes : STUDY) 
study (name : STRING; Geurse = Si 2 ING. 


Thus, aS expected, “develop.name" iS a string while 
"develop.classes" 15 an entity occurrence of type STUDY. 
To retrieve a course name the following notation must be 
used: 


SELECT develop.classes.course 
WHERE develop.name = "Smith". 


This notation eliminates the need for the user Ee 


develop a query with a complex join statement. As 
Previously mentioned, a join 15 not generally an easy 
concept for users to understand. If the dot notation had 


not been implemented, the query would Have been: 


= Se = Bag course 
FROM develop, study 
WHERE Wane =] omigeia as 
The dot notation query formulation iS more 
concise and easier to formulate for two reasons: Firsts 
an in-depth knowledge of a relations attributes, 


specifically their Semantics, 15 not needed Since a join 
is not explicitly performed. Although joining over the 
NAME attribute from the DEVELOP and STUDY relations is in 


this example semantically correct, there 1S no Quarantee 
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of semantic correctness in all cases. (heweaet motation 
prevents users from having to deal with low-level 
concepts such as attribute names and allows them to 
maintain the mental model of a set while formulating the 
query. 
eum Solutions 

The dot notation that 1s applied to reference 
attributes may be modified to handle set-type attributes. 
It 1s suggested that the following implementation enables 
users to retrieve from a set-type attribute a specific 
Value. 

Each atomic value within the set-type attribute 
(such as SMITH or JONES) should be stored within the 
user-invisible relation with a numerical value which 
uniquely identifies that value within the relation (see 
Figure 5.7). Notice that the values within the relation 
are ordered. If the user desires to know all the values 
of the set-type attribute, then the query is formulated 
in the uSual manner. However, if a certain or individual 
value 1s desired to be retrieved; then the query 
language must be modified so the user is able to retrieve 
that value. If a query such as "Name an instructor of 
student 126" is to be formulated, the query lanquage 
should be modified as follows: 

Selec? MastriuGecor .1 


FROM Sehne ell 
WHERE Sra = G2heo a 


Sy 


This notifies the system that the first value or element 
within the attribute is the item of interest. Note that 
the "1" an the query has no user-known link to the "1" 
stored next to the value "Drake" within the user- 
invisible relation that stores the instructor data and 
their corresponding integer values. The "i" in the query 
merely designates the first value within the set-type 
attribute. To retrieve the next value, the SELES] 
statement in the query would be "SELECT instructor.e” 

or “SELECT insStrletene eco signify an incremental 
increase of ane with respect to the location of the value 
within the set-type attribute. Thus; either of the two 
SELECT statements above, together with the FROM and 
WHERE statements, would retrieve the value of "Jones". 
After the value "Jones" is retrieved, another query with 
a SELECT statement such as "SELECT instructor.->” wows 


result in the value "Smith", while a query with a SELECT 


statement such as "SELECT instructor.<" would aqgaitm 
retrieve the value of "Drake". In this way we may 
traverse our way through the set-type attribute. The 


system must keep track of the last value from the set- 
type that was retrieved. 

The integer values stored with the instructor 
string values serve as a reference for the system to keep 
track of the relative positions of the values within the 


set. When the user types an attribute name followed by 
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">", 1t signals to the system to locate the value with 
the next highest corresponding integer value relative to 
the last integer value retrieved. This should not be too 
costly if the items in the user-invisible relation are 


ordered. 


= 6=—Cl SINS 
Joins were discussed in an earlier chapter therefore 
it 15 assumed the reader understands the basic concept of 
joins. 
1. Problems 

A criticism of relational systems has been that 
jOinS are very time-consuming and expensive. Improved 
techniques in the area of query optimization and inde>,rinag 
have been developed which have assuaged some of the 
opponents of relational systems. However; many opponents 
stand by the claim that regardless of the fact that 
optimization techniques have made joins less expensive to 
the system and its users;s; it is still an operation thet 
1S & thorn in the system’s side due to the difficulty 
users have in understanding joins. Furthermore, joins 
that semantically make no sense are still allowed by some 
relational DBMS products. 

tiie) jor tf Che Eeisoigs to make sense, must be 
executed on columns containing field values that are 
drawn from the same domain. However, this constraint 15 
not enough. A join made over two attributes with common 


oF 


domains such aS prices in one table and weights in 
another table is allowed But is not in keeping with the 
spirit of joins much less the semantics. An example will 
best illustrate the potential problem. Suppose a user 
wanted to do the following,s "List the districts where the 
employees have a credit of 3 dollars." The query in SOL 
form would be: 

SELECT “disteiet 

FROM teacher, employee 

WHERE employee.credit = 3. 
This query would cause the TEACHER and EMPLOYEE relations 


(Preum@me, S25) to be joined over the common attribute 


CREDIS The answers to the query would be CML (Carmel) 


TEACHER 








Erie eOVee 


PGi Gas oia. Relations Teachers and Employee 






70 


and MTY (Monterey). Although this may be true, there is 


no way to guarantee it. The user instructed the system 
what to do via the query. Being a procedural model, the 
system performs the join how it sees most efficient. The 


system 1S unable to semantically differentiate the two 
CREDIT attributes, therefore performs the join over the 
two different attributes, resulting in possible erroneous 
data. Such a join may be intended but is most likely 
not. There 1S no guarantee that a user will always 


recognize this mistake in logic. 


2a ess@lutions 
The following solutions to the problem are 
suggested. Users must have a basic understanding of a 
fom and how it works. It 1S not enough that they 
understand SQL alone. A user may be very well-versed 


with respect to SQL syntax issues but this does not 
prevent semantically inappropriate queries from end- 
users. Although a main benefit of the relational system 
is that users do not have to understand the underlying 
structure of the data to use the systems a join 1s really 
a simple operation that even naive users should 
understand. The MENU command mentioned earlier may be 
expanded to supply the user with not only the domain of 
each attribute, but also a brief Narrative comment about 
the domain. For example, in response to a user’s command 


of MENU TEACHER the system’s reply would be: 


wt 


Relation = TEACHER JOrmeeer aieus 


Tee alphanumeric teacher identification # 
CeURSe String course teaches 

CiStZ es string size of class 

CRE DIM integer course credit hours 
Dustin ca string city wiheremt aan 


This option will ensure that the user understands that 
the CREDIT attribute contained in the TEACHER relation 
would semantically be inappropriately joined with a 
CREDIT attribute containing an employee’s credit balance 
in en EMPLOY eGeemel ation. The system will do only what it 
is told . to doy by the) Use wer This resource may prevent 
the user from issuing a querys which the system would 
Gladly accept, that may be very harmful to himself by 
retrieving inaccurate data. 

To this point we have burdened the user with 
making sure the joins have semantically been sound. A 
more active role by the DBMS is the most effective way to 
achieve the desired result. An interactive dialogue with 
the user would prevent some inappropriate joins from 
GEeCuUur img: For example, a join between the TEACHER and 


EMPLOYEE relation would result in the system response: 


Joins relations : TEACHER and EMPLOYEE 
Join attribute : CREDIT 
Ales eg oie 4 TEACHER GCOUrSe Cred tt aces 


EMPLOYEE employee’s credit Balance 

Proceed with Join (Y/N)? > 
RF user with any experience in formulating queries should 
recognize that such a join 185 not semantically correct. 


To further help the user, the system should respond to a 


ie 


negative answer at the prompt with other possible 
relations, if any exist, that will provide the user with 
a possibly semantically correct eae For example, if 
the attribute CREDIT appears on another relation within 
the database the system should inform the user of this in 
such a manner: 
CREDIT appears on 3 files: TEACHER 
ey ea 
BANK 
PAOOSe ar bees  DESTRED: 


Here it lis assumed that there 1s yet a third relation 


within the database with the attribute CREDIT (namely 


BANK). 

Although the system dialogue with the user does 
take time, the benefit of a correct answer to his query 
outweighs this added time. It 1s assumed that a very 


experienced user with the database would not need to 
employ this feature. However, if a query receives an 
unexpected answer ; any user will be able to troubleshoot 


the cause of the suspected problem. 


aoe bY¥re CHECKING 
The real source of the join problem is the lack of 
strong data type checking. 
Ve Problems 
Database designers are unable to declare their 
Own data types other than system-defined data types. 
THis leads to the strong possibility of semantically 


rec: 


incorrect joins. the lack of strong cata type chec awe 
may also cause a loss of data integrity within the 
database. Examples will best illustrate this 
shortcoming. In the STUDENT relation; the domain of 
values for the YR attribute is the integers 1; 2; 3 and 
G. In the TEACHER relation, the domain of values for the 
CREDIT attribute is also the integers 1; 2; 3 and 4. 
Although the two attributes have the same domains joining 
the two relations over those two attributes would be 
semantically incorrect (years in school is hardly equal 
to Course EGredits)- 

Without a user-declared data type; constraints 
On updates are not as easily enforceable and lack of data 
integrity results. AS an example, assume a relation 


Called FLIGHTS exists in Our database 


me SMS (e) 


Ee. DEST CAPACITY TIX# 


[273 «least ODD == 


WEST EVEN” == 





Figure S.9.- Relation Fiiehee 


Also assume that a travel agency;s By convention; assigns 
odd ticket numbers to those passengers travelling East 
and even ticket numbers to those travelling West. The 
domain of the TIX# attribute in the FLIGHTS relation 15 
currently positive integers. Due to weak typing, an 
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agent may assign a wrong ticket number may be assigned to 
a passenger thus compromising the integrity of the 
database. 
ean So lUtIONS 

The implementation of a command by which users 
when creating a new relation; may declare their own data 
types is suggested. For example, a new table is usually 
created using a command such as BUILD in the following 
manner: 
BMTLD FLIGHTS(org=Dc, dest=Sc, capacity=1li, tix#=11). 


A relation called FLIGHTS is created with Tour 


Mee 2OUtTeS: org (origin)-—- with five or less characters, 
dest Gdestinationy=—— with. .five or less characters, 
Capacity--with a 1-byte integer, and tix# Caraket 
number), with a Loy eS integer. With a new command 
called TYPE, the user creating the new relation may 


declare a set type in such a manner: 


PRED FiabGHIS(org@eee2es= dest |= . Capacity seme 
Basst  - G@ineer lon) 
TYPE direction = even(integer) if s;travelling west 


eastoflord,dest) 
else odd(integer);travelling east 


The declaration of type-direction assumes a predicate 
“eastof" has been defined. When the statement 
"eastoflorg,dest)”" is executed, the system must check the 


Value of "org" against the value of "dest". A database 
of facts concerning the locations of all possible origins 


and destinations 1s assumed to be searched by the system 


ts 


when a new record is inserted in the FLIGHTS relation or 
when a record 1s updated. A specific example such as the 
preceding one serves to show how challenging it is to 
implement a command such as [TYPE into a system. The 
template of a type declaration is : 

TYPE typename = code using a high-level language 
with standard operators within 
that lanquage 

Not all TYPE declarations are so challenging. For 

example, if a current domain was currently over positive 
integers and the desiqner or DBA wanted to restrict the 
domain to integers greater than eighteen, the 


declaration would be: 


BUILD YRS(name : @€0c; state : 3c3 age : eighteen) 
TYPE eighteen = integer >= 18. 


If a user of the system attempts to update or enter a 
value within the age field of a record that is less than 
eighteen, the system should provide a signal of this 
illegal action. For example, the system may reply: 


USER ENTRY FOR AGE OUT OF LIMITS 
For help type "MENU YRS" or "EXPLAIN AGE" 


If the command MENU YRS 1s entered, the system would 
respond with: 


Relation = YRS 


NAME string 
STATE string 
AGE eighteen 


The user now sees that the domain, on he sooo eel emu 


a type called eighteen. A more ‘direct ~route tes tne 
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problem would be to furnish the user with a command that 
describes user-declared types. We suggest the command 
"EXPLAIN typename" to be implemented for this purpose. 
The user enters "EXPLAIN AGE" to which the system 
replies "TYPE EIGHTEEN :value entered must be greater 
than or equal to eighteen." The user 165 now aware of why 
the previous record was unable to be entered or updated 
samay take corrective action. 

It 15 obvious that the option of user-declared 
data types 15s costly to users with respect to time. The 
system has much more work to do because of the added 
constraints imposed by the user. The user benefits from 
the stronger data typing by preventing a compromise of 
the integrity of the database. Most would agree that 
Seectracy Of the information upon which the organization 
and/or people base their decisions 1S more valuable than 
having not so accurate information a little sooner. 
There continues to be considerable research effort in the 
area of doing as much type-checking as possible with as 
little run time cost as possible. bRetwmeee: 0 = 459). Unt! 
research in this area proves fruitful, the implementation 
of the system-user dialogue coupled with the 
implementation of the previously defined commands 15 


suggested. 


var 
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Access paths must be mentioned when critiqueing 
relational database systems. They play amajor role in 
the joining of relations and also greatly affect the 
overall performance of the system. An access path 
involves the order in which records are read. An access 
path also involves whether or not indexes are used and 
the decision of whether to read a record from file one 
and compare the values with file two, or to first read a 
record from f11e€ two and then compare the values with 
file one. For example, if a user wanted to find all the 
students who take the math course (assuming all students 
take every course taught in their— Gdistpier, then the 
STUDENT and TEACHER relations must be joined over the 
common attribute Distt cic Depending upon the access 
path, a record would first be read from the STUDENT 
relation and then compared with each record in the 
TEACHER relation or a record would first be read from the 
TEACHER relation and then compared with the records in 
the STUDENT © relation. In this particular example, 
because a selection is to be made (the "math" value from 
the COURSE Attusaeute. it would be wise to make the 
selection first. This would reduce the number of record 
instances to be considered for the join. [t sheularee 


remembered that tables are related by value as opposed to 


7s 


Pointers and that access paths are determined by the 
system and not by the programmer. 
1. Problems 

In hierarchical and network systems, access paths 
are predefined in the data structure. This 16 taken care 
of by the DBA. The relational systems differs in that 
there are no predefined paths in the data Structure as 
seen by the user. Many different paths may exist because 
access 1S accomplished by the matching of field values. 
There are pluses and minuses with these differences. a 
Hierarchical or network diagram, where access paths are 
predefined and explicitly shown, will in a quick glance 
provide an immediate understanding of many complex 
mercerreitationships. CRef. 14:p. 37) Gimemtne other hand. 
irony exists as the relational software, which works to 
buffer the user from access considerations, Peehror.s 
users from fully and quickly digesting how the tables are 
interrelated CRef. 15:p. 48). The gains made in the 
system by reduced procedurality may be lessened as there 
are no predefined access paths of which the user may take 
advantage . 

We have previously mentioned how time-consuming 
and inefficient joins may  0be. In fact, when the 
relational model was introduced by Codd, it was thought 
that the inability to choose efficient access paths when 


answering queries involving joins on random collections 
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of files would keep the relational system from ever 
becoming very practical. CRef. 6:p. Ver Optimization 
techniques have alleviated the problem. For instance, 
the System R (IBM’s) prototype for SQL/DS and DBe) CRef. 
Ge pe 94] query processor evaluates different 
possibilities for using or not using indexes and for 
making joins in one sequence or another sequence. The 
designers state that the path that really is least 
expensive to the system 1s chosen most of the time. 
a. Semutiams 

The following procedures are suggested for 
implementation in relational database management systems 
with respect to joins and access paths. 

Attributes over which the relations are joined 
may be from the same domain but are semantically worlds 
apart. The database designer or DBA should set up a file 
which lists relations and the attribute name they have in 
EC OMmMe ns The pair of relations should only be included in 
the list if joining over the like attribute would be 
semantically inappropriate. Bey example, relations 
EMPLOYEE and TEACHER would be included in the file with 
the common attribute of CREDIT separating the two (see 
EA guree ou lOoe Any time a user attempts a join; the 
indexed file 1s searched to see if the join should be 


allowed. If the system determines the join is allowed; 
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the system proceeds with the join. If the two relations 


and the joining attribute is in the file, the system 


COMMON 
Ri ATTRIBUTE Re 
EMPLOYEE CREDIT TEACHER 


Figure 53.10. Relations with A Common Attribute. 


should alert the user. For examples "JOIN NOT ALLOWED. 
SEMANTICALLY INCORRECT.” There should be an option for 
the user to find out the other relations which contain 
the common attribute. A command such as "FIND attribute” 
would suffice. For example, the command "FIND CREDIT" 


would result in the system responding: 


TEACHER course credit hours 
SMPEGYEE credit balance 
The file 15, in effect; a Boolean type operator which 


determines whether or not the relations may be joined. 
The database designer or DBA must be careful when new 
relations are added to the database. Furthermore, each 
operation on a relation creates a new relation that must 
me taken imto account. An algorithm must exist that 
ODlaces the attribute already contained ina relation in 
the file along with the other relations thet contain the 
attribute. 

Along with the prevention of improper joins, the 


system should notify the user when there is no way to 
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make @ join. A system response to a join unable to be 
made may be: 

NO JOIN ABLE TO BE MADE. TYPE “FIND attribute nane. 

FOR OTHER POSSIBLE RELETIG@NS With DES TRE aie einG 

ATLIRIT BE Re 

If there exists more than one way to make joins, 

the answer should be arrived at by the system via the 
shortest path. This 185 an 158suUe that the database 
designer must deal with. However, if the designer is 
unable to avoid a situation whereby there is more than 
one way to answer the same query, the user should be 
aware of this. As mentioned earlier, a user-dialogque 
where the system provides the user with the chance to 
pick the relations over which the join will be performed 
should be available. For example; 2a response to a query 
that may be answered by the joining of difftegeng: 
combinations of relations may be: 


RELATIONS CHOSEN FOR JOIN ARE : TEACHER AND EMPLOYEE 
ARE YOU SATISFIED WITH THE RELATIONS 2@GSei 7a 


An affirmative answer would ainstruct the system to 
proceed with the join, a negative answer should instruct 
the system to provide the user with the other relation(s) 
that may be chosen for joining. 

Relations which are semantically joined 
improperly is not the only source of problems with joins. 
R user without a security clearance must not be able to 
retrieve sensitive data from the database if he 
formulates a query which causes a relation he has 
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authorization to access to be joined with a relation 
which contains sensitive data. 

Whether or not the relation 1s sorted and the 
presence (or absence) of indexes on the join attributes 


also plays a major role in the performance and efficiency 


Sa. joins. The importance of access paths, which so 
greatly affect the performance of joins, and hence the 
relational DBMS performance cannot be understated. It is 


for this reason that the previous recommendations place a 


heavy emphasis upon the semantics of any joins performed. 


Ss 
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Database processing technology 1s a rapidly growing 
field with the future promising continued growth. 
Organizations are becoming increasingly dependent upon 
their database processing capabilities in order to remain 
competitive. Organizations which do not effectively and 
efficiently utilize their database run the risk of not 
measuring up to their competitors. Due to this increased 
importance, database management systems must be as 
helpful, logical and semantic as possible to users. The 
recommended solutions to the problems detailed in the 
previous chapter have all aimed to increase a system’s 
semantic weaknesses. Query formulation is also aided by 
the recommendations. 


The MENU and VIEW commands better familiarize end=— 


users with the relations which, in total, create the 
database. Any questions concerning any relations may 
Quickly be resolved. A set-type attribute allows 
relations to more accurately model the real world. All 


the facts are contained in a tuple and storage space is 
saved while redundancy is reduced. Query formulation is 


aided as users are no longer forced to think in terms of 


the physical database structure. Query language 
modifications allow individual values of set-type 
attributes to be selected. Joins are made more 
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semantically clear through an interactive dialogue 
between the system and user. User declared data types 
provide greater database integrity and flexibility. 

These extensions to a relational DBMS serve to 
enhance a system’s semantics and directly aide in query 
formulation. This in turn provides for a user-friendly 
system which will promote accuracy and efficiency while 
using one of the organizations most valued resources, its 


database. 


Sa 
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